feat: UAT runner ergonomics + demote fastmcp tool-failure tracebacks #1051
Conversation
- Demote fastmcp tool-validation/FastMCPError tracebacks to one-line WARNING via ToolValidationLogFilter (pending upstream fastmcp PR #4029).
- Runner ergonomics: argparse typo suggestions, TTY stdin guard, clearer startup progress, UAT README quick-start, LOG_LEVEL=WARNING default for the spawned MCP subprocess.
- Share one in-process FastMCP server per agent run (new tests/uat/_inprocess.py) instead of rebuilding per setup/verify phase. Saves ~1.5s per story-phase; ~150s on a 50-story run.
- Migrate all UAT scripts from print-based log() to stdlib logging; keep the uat.* namespace at INFO while silencing httpx/openai/mcp INFO chatter.
- Fix Summary pass/fail: now uses _compute_passed (same logic as the JSONL records), so stories that fail ha_checks but exit 0 are correctly marked FAIL.
- Drop the duplicated client-side validation error echo (the server log already shows the failure details).
Summary of Changes (Gemini Code Assist)

This pull request improves the ergonomics, performance, and logging reliability of the UAT runner and story harness. By centralizing logging, optimizing MCP client lifecycle management, and refining CLI usability, these changes make the test suite faster and easier to debug without altering production behavior.
Code Review
This pull request refactors the User Acceptance Testing (UAT) suite to improve logging, error reporting, and resource efficiency. It introduces a ToolValidationLogFilter to demote noisy tracebacks, standardizes on the Python logging module across CLI tools, and implements a shared in-process FastMCP client context to reduce server startup overhead. Feedback focuses on adhering to the project's requirement for type hints in all function signatures and improving the robustness of the story runner by handling setup failures more gracefully to avoid terminating the entire test suite.
run_start was shadowed inside the per-story loop by a variable tracking the test prompt start (used for session file discovery). Rename the inner variable to prompt_start so the Summary's elapsed calculation reflects the whole run.
kingpanther13
left a comment
Thanks for the UAT cleanup — the shared MCP client savings, the _compute_passed consolidation, and the unified uat.* logger namespace are all solid wins. A few items worth addressing before merge:
Important

- ToolValidationLogFilter is wider than the stated intent (src/ha_mcp/__main__.py:387) — isinstance(err, FastMCPError) also matches AuthorizationError, ResourceError, PromptError, etc. If the goal is "tool-raised, user-visible error", tightening to isinstance(err, ToolError) would match the docstring exactly and avoid silencing a future auth failure.
- Filter discards exc_info/exc_text permanently (__main__.py:394-395) — any downstream handler (Sentry, structured logging, file tail) loses the stack after this filter runs. Consider stashing a one-frame summary into record.msg instead of nulling entirely, so minimal debug context survives (see the sketch after this list).
- Commit / PR scope is misleading — feat(uat): covers a production logging change that affects the live server, not just UAT. Release-note automation keys off these prefixes. Splitting the log filter into its own feat(internal): demote fastmcp tool-failure tracebacks commit (or retitling the PR) would keep the changelog honest.
- Shared MCP client is a single point of failure across stories (tests/uat/stories/run_story.py around line 786) — the prior per-call _mcp_context gave implicit isolation; reusing one client means a story that corrupts the WebSocket can poison every subsequent story in the same agent run. A short try/reconnect guard, or at minimum a docstring note acknowledging the tradeoff, would help future maintainers diagnose intermittent failures.
- inprocess_mcp_client's finally doesn't reset websocket_manager (tests/uat/_inprocess.py:49-58) — env vars and _settings are restored, but the module-level websocket stays connected to the (possibly-stopped) test container. Consider await websocket_manager.disconnect() in the finally too, symmetrical with the entry path.
- Env-var mutation is process-global (_inprocess.py:37-42) — safe under current sequential usage, but worth an explicit docstring line calling out "not safe for concurrent use" so nobody later adds pytest-xdist or parametrized fixtures and hits races.
- _run_mcp_step teardown swallows bare Exception at INFO level (run_story.py around lines 286-290) — a broken websocket during teardown is logged "failed, ignored" at INFO and silently poisons the next story's setup via the shared client. logger.warning (or logger.exception) would be more honest, and the catch could be narrowed to ToolError / expected transport errors.
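For concreteness, a minimal sketch of the variant the first two items suggest: match ToolError only, and fold a one-line summary into the message rather than dropping all context. The fastmcp.exceptions.ToolError import path is taken from this review; the filter that actually shipped in src/ha_mcp/__main__.py may differ in details.

```python
import logging

from fastmcp.exceptions import ToolError  # import path as referenced above


class ToolValidationLogFilter(logging.Filter):
    """Demote expected, tool-raised failures to a one-line WARNING (sketch)."""

    def filter(self, record: logging.LogRecord) -> bool:
        exc = record.exc_info[1] if record.exc_info else None
        if isinstance(exc, ToolError):
            # Keep a short summary in the message so downstream handlers
            # (Sentry, structured logging, file tails) still see what failed.
            record.msg = f"{record.getMessage()} [{type(exc).__name__}: {exc}]"
            record.args = ()
            record.exc_info = None
            record.exc_text = None
            record.levelno = logging.WARNING
            record.levelname = logging.getLevelName(logging.WARNING)
        return True  # never drop the record, only demote it
```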
Nice-to-have

- Test coverage for ToolValidationLogFilter — the sibling StatelessSessionLogFilter has tests/src/unit/test_stateless_session_log_filter.py with five targeted cases. Cloning that template for the new filter would lock in the behavior (bare Exception passes through, subclass handling, exc_info=None pass-through, right-logger/wrong-message, etc.) and guard against a future tightening regression (see the test sketch after this list).
- Lazy imports inside filter() (__main__.py:379-380) — both fastmcp.exceptions and pydantic are loaded long before any log record reaches this filter, so hoisting the imports to module scope removes a per-record sys.modules lookup in the hot path. Low-impact but free.
- Docstring clarity in _inprocess.py — "the env-swap and WebSocket disconnect point ha_mcp's module-level settings at the target HA instance" is accurate but a bit opaque. Something like "clearing ha_mcp.config._settings forces the next get_global_settings() call to re-read env; the websocket disconnect tears down any cached client on the previous URL" would be easier on future readers.
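A test along those lines could take the following shape; the file name, logger name, and filter import location are assumptions mirroring the review comments, not the PR's actual test module.

```python
import logging

from fastmcp.exceptions import ToolError  # assumed import path

from ha_mcp.__main__ import ToolValidationLogFilter  # location per the review


def _record(exc: BaseException | None) -> logging.LogRecord:
    # Build an ERROR record the way fastmcp's tool manager would emit it;
    # the logger name and message are illustrative.
    return logging.LogRecord(
        name="fastmcp.tools.tool_manager",
        level=logging.ERROR,
        pathname=__file__,
        lineno=0,
        msg="Error calling tool 'light_control'",
        args=(),
        exc_info=(type(exc), exc, exc.__traceback__) if exc else None,
    )


def test_bare_exception_keeps_traceback() -> None:
    record = _record(RuntimeError("boom"))
    assert ToolValidationLogFilter().filter(record)
    assert record.exc_info is not None  # real bugs keep their full stack


def test_tool_error_is_demoted() -> None:
    record = _record(ToolError("entity_id is required"))
    assert ToolValidationLogFilter().filter(record)
    assert record.exc_info is None  # expected, user-facing failure: one line only
```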
Strengths

- Filter correctness verified: bare Exception and NotFoundError retain full tracebacks; pydantic.ValidationError is identity-equal to pydantic_core.ValidationError, so both fastmcp log paths are caught.
- append_result(passed=False) default is a good failure-closed choice — the prior exit_code=0 default is exactly what caused the Summary/JSONL divergence this PR fixes.
- verify_ha_checks(mcp_client) breaking change is safe (single caller, grep-verified).
- SuggestingArgumentParser, stdin TTY guard, the /v1 suffix note in the README, and the per-agent prompt_start vs run_start split (commit 945a4c8) — all genuine UX improvements.
Happy to iterate on any of these.
…ed client

- Narrow ToolValidationLogFilter to ToolError (was FastMCPError)
- Add websocket_manager.disconnect() in inprocess_mcp_client finally
- Log teardown failures as WARNING (was INFO) with shared-client note
- Hoist fastmcp/pydantic imports to module scope
- Document shared-client SPoF tradeoff and process-global env mutation
- Add unit tests for ToolValidationLogFilter (6 cases)
Thanks for the review. Addressed in 8869f42:
Not addressing:
kingpanther13
left a comment
Thanks for the quick turnaround. All the code items land cleanly, the new test_tool_validation_log_filter.py is exactly the right shape (the FutureAuthError case nicely locks in that non-ToolError subclasses keep their stacks), and the rewritten _inprocess.py docstring is a lot clearer.
On the exc_info nulling — fair call. With the filter narrowed to ToolError / pydantic ValidationError and the structured detail folded into record.msg, these really are user-input errors rather than bugs needing a Sentry-grade stack. The docstring now makes that intent explicit, which is what I was really after.
LGTM.
🧪 Your changes are now in the dev channel!

Your PR has been merged to master and is available for testing in the dev channel. Test your changes before the next stable release (biweekly Wednesday):

Quick start:

```bash
# Run dev version
uvx ha-mcp-dev

# Check version
uvx ha-mcp-dev --version
```

Docker:

```bash
docker pull ghcr.io/homeassistant-ai/ha-mcp:dev
docker run --rm -i \
  -e HOMEASSISTANT_URL=http://your-ha:8123 \
  -e HOMEASSISTANT_TOKEN=your_token \
  ghcr.io/homeassistant-ai/ha-mcp:dev
```

Found an issue? Please open a new bug report and mention this PR for context.
What does this PR do?
Cleans up the UAT runner and story harness along several axes surfaced while debugging a BAT session. No behavior changes to the production MCP server beyond one log-level downgrade.
Server-side
- ToolValidationLogFilter in src/ha_mcp/__main__.py demotes fastmcp's Error validating tool / Error calling tool tracebacks to single-line WARNINGs when the underlying exception is a Pydantic ValidationError or a FastMCPError (tool-raised). Real tool bugs (bare Exception) keep their full stack. This is a local backstop; an upstream fastmcp PR (fix: drop exc_info for expected tool failures, remove unreachable ValidationError, PrefectHQ/fastmcp#4029) fixes the same at source.
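For readers unfamiliar with logging filters, one way such a filter gets wired in (the PR text doesn't show where __main__.py attaches it, so the handler-level placement and import path below are assumptions). Attaching at handler level matters because a filter placed on the "fastmcp" logger alone would not see records propagated up from its child loggers.

```python
# Sketch only: attach the filter to the root handlers so it also catches
# records that propagate up from fastmcp's child loggers.
import logging

from ha_mcp.__main__ import ToolValidationLogFilter  # assumed import path

for handler in logging.getLogger().handlers:
    handler.addFilter(ToolValidationLogFilter())
```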
UAT runner ergonomics (tests/uat/run_uat.py, tests/uat/stories/run_story.py, tests/uat/README.md)

- SuggestingArgumentParser: typo-tolerant argparse with difflib suggestions (e.g., --agants → did you mean --agents?); sketched after this list.
- run_uat.py with no pipe/file on stdin fails fast with a helpful message instead of hanging.
- Clearer startup progress for run_story.py --all.
- README quick-start with uv run python; the /v1 suffix requirement is called out for LM Studio/Ollama.
- LOG_LEVEL=WARNING default for the spawned MCP subprocess (override with --mcp-env LOG_LEVEL=INFO).
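The typo-suggestion idea, roughly. This is a sketch built on stdlib argparse and difflib; the actual SuggestingArgumentParser in run_uat.py may be implemented differently.

```python
import argparse
import difflib


class SuggestingArgumentParser(argparse.ArgumentParser):
    """ArgumentParser that appends 'did you mean ...?' hints to unknown-option errors."""

    def error(self, message: str) -> None:
        if "unrecognized arguments:" in message:
            unknown = message.split("unrecognized arguments:", 1)[1].split()
            known = [opt for action in self._actions for opt in action.option_strings]
            for arg in unknown:
                match = difflib.get_close_matches(arg, known, n=1, cutoff=0.6)
                if match:
                    message += f"\n  did you mean {match[0]}?"
        super().error(message)  # still prints usage and exits with status 2


parser = SuggestingArgumentParser(prog="run_uat.py")
parser.add_argument("--agents", type=int, default=1)
# parser.parse_args(["--agants", "2"])  # -> error: ... did you mean --agents?
```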
Shared in-process MCP client (tests/uat/_inprocess.py)

- One HomeAssistantSmartMCPServer + FastMCP client per agent run, shared across all stories' setup/verify/teardown. Previously rebuilt per phase per story; construction is ~1.5s, savings ~150s on a 50-story run. The core pattern is sketched after this list.
- verify_ha_checks(mcp_client) now takes the client as a required argument.
- _mcp_context in verify_story.py was a duplicate and has been removed.
- The mcp_client fixture in tests/uat/stories/conftest.py also uses the shared helper (bonus: it now picks up the websocket_manager.disconnect() that the old fixture skipped).
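Reduced to its core, the sharing looks roughly like this. It is a sketch, not the actual _inprocess.py, which also swaps HOMEASSISTANT_* env vars, clears ha_mcp's cached settings, and disconnects the websocket on exit; it assumes fastmcp's Client accepts a FastMCP server instance for in-memory transport.

```python
import contextlib
from collections.abc import AsyncIterator

from fastmcp import Client, FastMCP


@contextlib.asynccontextmanager
async def shared_inprocess_client(server: FastMCP) -> AsyncIterator[Client]:
    """Yield one in-memory client to reuse across every story phase in a run."""
    # The expensive server construction happens once, in the caller; this context
    # is entered once per agent run and the same client is handed to each phase.
    async with Client(server) as client:
        yield client
```

Call sites enter the context once per agent run and pass the same client into setup, verify, and teardown, which is where the per-phase rebuild savings come from.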
Logging migration (tests/uat/*)

- Bespoke def log() helpers (thin print wrappers) replaced with stdlib logging module usage throughout. Each UAT file uses an explicit logging.getLogger("uat.<module>") so the namespace-level filter works regardless of script vs. module invocation.
- tests/uat/_logging.py::configure_cli_logging() — single place that sets root=WARNING / uat.*=INFO. Silences httpx/openai/mcp INFO chatter without suppressing our trace (sketched after this list).
- FATAL: ... → logger.critical, ERROR: ... → logger.error/logger.exception, check failures and summary-level story failures → logger.warning.
- The errors.pydantic.dev URL footer is stripped from client-side validation error echoes (tests/uat/openai_agent.py::_strip_pydantic_url).
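The single-entry-point idea in stdlib terms; a sketch only, and the verbose flag is illustrative rather than taken from the PR.

```python
import logging


def configure_cli_logging(verbose: bool = False) -> None:
    """Root stays at WARNING (silences httpx/openai/mcp INFO); uat.* stays chatty."""
    logging.basicConfig(
        level=logging.WARNING,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    logging.getLogger("uat").setLevel(logging.DEBUG if verbose else logging.INFO)


# Each UAT module asks for its own child logger, so the "uat" level applies
# whether the file runs as a script or is imported as a module:
logger = logging.getLogger("uat.run_story")
```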
Summary correctness

- The Summary block in run_story.py now uses _compute_passed (the same logic as the JSONL records written via append_result). Previously it only considered the test-prompt exit code, so stories that failed ha_checks but exited 0 were marked PASS in the summary but FAIL in the JSONL. These now agree; the shape of the predicate is sketched after this list.
- append_result signature changed: exit_code= → passed= (single source of truth, computed once per story).
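Illustrative shape of that predicate (assumed signature; the real _compute_passed in run_story.py may take richer inputs):

```python
def _compute_passed(exit_code: int, ha_checks_ok: bool) -> bool:
    # Single source of truth: both the Summary block and the JSONL record ask
    # this once per story, so a clean exit with failed ha_checks is a FAIL.
    return exit_code == 0 and ha_checks_ok
```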
Type of change

Testing
- s01_automation_sunset_lights against LM Studio (verified: clean WARNING logs, no URL, one server construction, correct FAIL summary from _compute_passed)
- uv run pytest tests/uat/stories/test_verify_story.py — 12/12
- uv run ruff check — clean

Checklist